Confusion Matrix

confusion-matrix-1.png|300
source

Performance measures

Accuracy

= the fraction of the total samples that were correctly classified by the classifier

(TP+TN)/(TP+TN+FP+FN)

Misclassification Rate/Classification Error

= the fraction of predictions that were incorrect

(FP+FN)/(TP+TN+FP+FN) or (1 - Accuracy)

Precision

= the fraction of samples predicted as positive that were actually positive

TP/(TP+FP)

Sensitivity / True Positive Rate (TPR) / Probability of Detection / Recall

= the fraction of all positive samples that were correctly predicted as positive by the classifier

TP/(TP+FN)

Specificity / True Negative Rate (TNR)

= the fraction of all negative samples that were correctly predicted as negative by the classifier

TN/(TN+FP)
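The definitions above can be computed directly from the four confusion-matrix counts. A minimal sketch, using invented example counts:

```python
# Computing the metrics above from raw confusion-matrix counts.
# The counts are invented example values, not real data.
TP, TN, FP, FN = 40, 45, 5, 10

total = TP + TN + FP + FN
accuracy = (TP + TN) / total        # fraction classified correctly
error_rate = (FP + FN) / total      # = 1 - accuracy
precision = TP / (TP + FP)          # predicted positives that are truly positive
recall = TP / (TP + FN)             # sensitivity / TPR
specificity = TN / (TN + FP)        # TNR

print(accuracy, error_rate, precision, recall, specificity)
```
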

Interesting

In clinical studies, accuracy can be seen as a weighted sum of sensitivity and specificity:

accuracy = sensitivity x prevalence + specificity x (1 - prevalence)

where prevalence represents the probability of a disease (positive).
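This identity can be checked numerically. A small sketch, with invented counts:

```python
# Check of the identity accuracy = sensitivity * prevalence + specificity * (1 - prevalence).
# TP, TN, FP, FN are invented example counts.
TP, TN, FP, FN = 40, 45, 5, 10
total = TP + TN + FP + FN

prevalence = (TP + FN) / total      # fraction of truly positive samples
sensitivity = TP / (TP + FN)
specificity = TN / (TN + FP)
accuracy = (TP + TN) / total

weighted = sensitivity * prevalence + specificity * (1 - prevalence)
assert abs(accuracy - weighted) < 1e-12
```

The identity holds for any counts, since the positives contribute `sensitivity * prevalence` and the negatives contribute `specificity * (1 - prevalence)` to overall accuracy.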

F1-score:

= combines precision and recall into a single measure; mathematically, it is the harmonic mean of precision and recall (range [0, 1])

F1 score = 2 × Precision × Sensitivity / (Precision + Sensitivity)
The Hundred-Page Machine Learning Book
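As a quick sketch (the precision and recall values are invented):

```python
# Harmonic mean of precision and recall; example values are invented.
precision, recall = 8 / 9, 0.8
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 4))  # 0.8421
```

The harmonic mean punishes imbalance: if either precision or recall is near zero, F1 is near zero regardless of the other.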

How to choose between precision and sensitivity

  • by assigning a higher weight to the examples of a specific class (the SVM algorithm accepts class weightings as input)
  • by tuning hyperparameters to maximize precision or recall on the validation set
  • by varying the decision threshold for algorithms that return probabilities of classes; for instance, with logistic regression or a decision tree, raising the threshold increases precision (at the cost of recall)
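The threshold trade-off in the last bullet can be sketched without any particular model; the probabilities and labels below are invented illustration data:

```python
# How moving the decision threshold trades precision against recall.
# `probs` are hypothetical predicted probabilities; `labels` are true classes.
probs  = [0.1, 0.3, 0.45, 0.55, 0.6, 0.7, 0.8, 0.9]
labels = [0,   0,   0,    1,    1,   1,   0,   1  ]

def precision_recall(threshold):
    preds = [1 if p >= threshold else 0 for p in probs]
    tp = sum(p == 1 and y == 1 for p, y in zip(preds, labels))
    fp = sum(p == 1 and y == 0 for p, y in zip(preds, labels))
    fn = sum(p == 0 and y == 1 for p, y in zip(preds, labels))
    prec = tp / (tp + fp) if tp + fp else 1.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    return prec, rec

for t in (0.4, 0.5, 0.85):
    print(t, precision_recall(t))
```

On this data, raising the threshold from 0.4 to 0.85 moves precision up (2/3 → 1.0) while recall falls (1.0 → 0.25).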

Cumulative Accuracy Profile (CAP)

resource: https://waleblaq.medium.com/the-cap-curves-the-cumulative-accuracy-profile-58a141e01fae

CAP Analysis

We can analyze the CAP curve in two ways.

Way 1: the ratio of the area under the good model's curve to the area under the ideal curve

assets/images/confusion-matrix-2.png|500

Way 2:

/assets/images/confusion-matrix-3.png|500
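A NumPy sketch of one common formulation of the Way 1 ratio (the "accuracy ratio"), in which both areas are measured relative to the random-model diagonal; the scores and labels below are invented:

```python
import numpy as np

# Hypothetical classifier scores and true labels (invented illustration data).
labels = np.array([1, 0, 1, 1, 0, 0, 1, 0, 0, 0])
scores = np.array([0.9, 0.8, 0.75, 0.6, 0.55, 0.5, 0.45, 0.4, 0.3, 0.1])

order = np.argsort(-scores)                         # best-scored samples first
captured = np.cumsum(labels[order]) / labels.sum()  # fraction of positives found
x = np.arange(1, len(labels) + 1) / len(labels)     # fraction of samples examined
x = np.insert(x, 0, 0.0)
captured = np.insert(captured, 0, 0.0)

# Trapezoidal area under the model's CAP curve.
area_model = np.sum((x[1:] - x[:-1]) * (captured[1:] + captured[:-1]) / 2)

prevalence = labels.mean()
area_random = 0.5                                   # area under the diagonal
area_perfect = prevalence / 2 + (1 - prevalence)    # rises to 1 at x = prevalence, then flat

accuracy_ratio = (area_model - area_random) / (area_perfect - area_random)
print(accuracy_ratio)  # 1 = perfect ranking, 0 = no better than random
```

A model that ranks all positives first gets a ratio of 1; a model no better than random shuffling gets roughly 0.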

Receiver Operating Characteristic (ROC)